2014-08-10 Progress Report
Protein Design
Like the Slovenian iGEM entry from 2008, I decided to use flagellin as my antigenic protein.
The Slovenian team were trying to elicit a humoral (antibody) response to H. pylori flagellin, a protein that normally evades the mammalian immune system, by creating a hybrid flagellin that is part E. coli and part H. pylori. The goal was a vaccine against H. pylori, the bacterium famously identified as a cause of stomach ulcers, which infects at least half of the world's population.
Flagellin is roughly 500 amino acids long (about 1,500 nucleotides of coding sequence). That is convenient, because synthesizing anything longer than about 2,000 nucleotides is still somewhat involved. For example, IDT's gBlocks, which are inexpensive and widely used, max out at 2,000 nucleotides. Combining fragments to make longer genes is not particularly difficult, but it is not off-the-shelf either.
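The size arithmetic is easy to check. A protein of n residues needs 3(n + 1) nucleotides including the stop codon, and a gene longer than the synthesis limit has to be split into overlapping fragments. This is a minimal sketch that assumes the 2,000-nt gBlock limit quoted above and a hypothetical 30-nt overlap between fragments for assembly:

```python
import math

# Size limit for a single gBlock, as quoted in the text (2014).
GBLOCK_MAX_NT = 2000


def cds_length(n_residues: int) -> int:
    """Nucleotides needed to encode n_residues amino acids plus a stop codon."""
    return 3 * (n_residues + 1)


def fragments_needed(length_nt: int, max_nt: int = GBLOCK_MAX_NT,
                     overlap_nt: int = 30) -> int:
    """Minimum number of overlapping fragments that together cover length_nt.

    The first fragment contributes max_nt bases and each later one adds
    (max_nt - overlap_nt) new bases, so n fragments cover
    n * (max_nt - overlap_nt) + overlap_nt bases.
    overlap_nt = 30 is a hypothetical assembly overlap, not a vendor value.
    """
    if length_nt <= max_nt:
        return 1
    return math.ceil((length_nt - overlap_nt) / (max_nt - overlap_nt))


if __name__ == "__main__":
    for residues in (400, 500, 1000):
        nt = cds_length(residues)
        print(f"{residues} aa -> {nt} nt -> {fragments_needed(nt)} fragment(s)")
```

A 500-residue protein needs 1,503 nt and fits in a single fragment. A 1,000-residue protein (3,003 nt) would need two.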
E. coli flagellin is highly immunogenic, as you might expect from a common and abundant bacterial protein. There's even a specific innate immune receptor, Toll-like receptor 5, that recognizes flagellins.
So as a short-ish protein that evokes a strong immune response, flagellin is a reasonable choice for my experiment.
Codon optimization
For most amino acids there is more than one codon. Synonymous codons differ in how efficiently they are translated, and different organisms prefer different codons. The resulting difference in protein yield can be large, so to maximize expression of my protein I need to choose my codons carefully. Many tools can help with this, including ones from IDT, DNA2.0 and GenScript.
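The simplest version of codon choice is to back-translate the protein using one preferred codon per amino acid. This is a minimal sketch only: the codon table below is an illustrative assumption (one E. coli-favored codon per residue), not measured usage data, and `back_translate` is a hypothetical helper. The tools named above use real codon-usage frequencies and more elaborate scoring.

```python
# ASSUMPTION: simplified, illustrative table with one favored E. coli codon
# per amino acid ("*" = stop). Not measured codon-usage frequencies.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def back_translate(protein: str, table: dict[str, str] = PREFERRED_CODON) -> str:
    """Return a DNA coding sequence using one preferred codon per residue."""
    protein = protein.upper()
    if not protein.endswith("*"):
        protein += "*"  # make sure the gene ends with a stop codon
    try:
        return "".join(table[aa] for aa in protein)
    except KeyError as err:
        raise ValueError(f"no codon for residue {err.args[0]!r}") from None


if __name__ == "__main__":
    print(back_translate("MAQV"))  # ATG GCG CAG GTG + TAA stop
```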
Apart from choosing the most efficiently translated codons for the host organism, the main thing these tools do is avoid sequence motifs with unwanted side effects, such as restriction sites, rho-independent terminators (which can end transcription early) or internal ribosome binding sites (which can start translation partway through the gene).
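Motif avoidance can be sketched as a greedy search: for each residue, take the best-ranked synonymous codon that does not create a forbidden motif on either strand. This is an illustration under stated assumptions, not how any particular vendor's tool works. The codon ranking is illustrative, `design_cds` is a hypothetical helper, and a greedy pass can dead-end (for example when a motif is completed by a residue with only one codon, such as Met), at which point a real tool would backtrack or optimize globally.

```python
# Synonymous codons per amino acid, most-preferred first ("*" = stop).
# ASSUMPTION: the ranking is illustrative; real tools rank by host codon usage.
RANKED_CODONS = {
    "A": ["GCG", "GCC", "GCA", "GCT"], "R": ["CGT", "CGC", "CGG", "CGA", "AGA", "AGG"],
    "N": ["AAC", "AAT"], "D": ["GAT", "GAC"], "C": ["TGC", "TGT"],
    "Q": ["CAG", "CAA"], "E": ["GAA", "GAG"], "G": ["GGC", "GGT", "GGG", "GGA"],
    "H": ["CAT", "CAC"], "I": ["ATT", "ATC", "ATA"],
    "L": ["CTG", "TTA", "TTG", "CTT", "CTC", "CTA"], "K": ["AAA", "AAG"],
    "M": ["ATG"], "F": ["TTT", "TTC"], "P": ["CCG", "CCA", "CCT", "CCC"],
    "S": ["AGC", "TCT", "TCC", "TCG", "AGT", "TCA"], "T": ["ACC", "ACG", "ACT", "ACA"],
    "W": ["TGG"], "Y": ["TAT", "TAC"], "V": ["GTG", "GTT", "GTC", "GTA"],
    "*": ["TAA", "TGA", "TAG"],
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def design_cds(protein: str, forbidden: list[str]) -> str:
    """Greedily pick the best-ranked codon that creates no forbidden motif.

    Motifs are checked on both strands. The prefix built so far is already
    motif-free, so any hit in the candidate must overlap the codon just added.
    """
    motifs = {m.upper() for m in forbidden}
    motifs |= {reverse_complement(m) for m in motifs}
    protein = protein.upper()
    if not protein.endswith("*"):
        protein += "*"
    dna = ""
    for pos, aa in enumerate(protein):
        for codon in RANKED_CODONS[aa]:
            candidate = dna + codon
            if not any(m in candidate for m in motifs):
                dna = candidate
                break
        else:
            # Greedy dead end: a real tool would backtrack here.
            raise ValueError(f"no acceptable codon at residue {pos} ({aa})")
    return dna


if __name__ == "__main__":
    print(design_cds("LQ", []))          # CTG CAG creates a PstI site (CTGCAG)
    print(design_cds("LQ", ["CTGCAG"]))  # falls back to CAA for Gln
```

For instance, Leu-Gln with the top-ranked codons gives `CTGCAG`, a PstI site; forbidding that site makes the function fall back to `CAA` for glutamine while keeping the protein unchanged.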
Next steps
It seems like it should be obvious, but I am still unsure which restriction sites I need to avoid in my gene's DNA sequence. E. coli methylates its genome to protect it from its own restriction enzymes, and plasmid DNA propagated in E. coli is apparently methylated too, so the plasmid should be safe from E. coli's restriction enzymes. However, I don't yet know which restriction enzymes will be used to clone my gene into its vector, so for now I am assuming I need to avoid every restriction site in the vector's multiple cloning site (MCS).
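Once the vector is chosen, checking a candidate coding sequence against its MCS is a string search on both strands. In this sketch the enzyme list is a hypothetical MCS (the recognition sequences themselves are the standard ones), and the helper names are made up for illustration. Enzymes with degenerate recognition sites would need IUPAC-aware matching, which this sketch does not attempt.

```python
# HYPOTHETICAL multiple cloning site: enzyme -> recognition sequence (5'->3').
# Replace with the enzymes actually present in the chosen vector's MCS.
MCS_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XbaI": "TCTAGA",
    "PstI": "CTGCAG",
    "XhoI": "CTCGAG",
    "NdeI": "CATATG",
    "NotI": "GCGGCCGC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """Start positions of every (possibly overlapping) occurrence of motif."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def conflicting_sites(insert: str, sites: dict[str, str] = MCS_SITES) -> dict[str, list[int]]:
    """Map each enzyme whose site occurs in the insert to top-strand start positions.

    The reverse complement is also searched, which matters for
    non-palindromic sites; palindromic sites simply match twice at one spot.
    """
    insert = insert.upper()
    conflicts = {}
    for enzyme, site in sites.items():
        hits = set(find_all(insert, site)) | set(find_all(insert, reverse_complement(site)))
        if hits:
            conflicts[enzyme] = sorted(hits)
    return conflicts


if __name__ == "__main__":
    print(conflicting_sites("ATGGAATTCAAA"))  # EcoRI site starting at index 3
```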